A Geographic Information System (GIS) is a computer system that analyzes and displays geographically referenced information. It uses data that is attached to a unique location. - www.usgs.gov/faqs/what-a-geographic-information-system-gis
Geocoding is the process of transforming location information (e.g., street addresses, place names, etc.) into geocoordinate data (e.g., latitude/longitude) that can be plotted and symbolized on a map. - https://guides.library.brandeis.edu/c.php?g=301754&p=7102006
import pandas as pd
import os
from geocodio import GeocodioClient
Read in NYCHA address data
fn_nycha_address = 'Address Guide 02-01-2021.xlsx'
df_nycha_address = pd.read_excel(os.path.join('../ref_data', fn_nycha_address), dtype=str, sheet_name='Sheet1')
df_nycha_address.head()
| BOROUGH | HOUSE # | STREET | ADDRESS | ZIP CODE | DEVELOPMENT | MANAGED BY | COMMUNITY DISTRICT# | BLOCK | LOT | BIN# | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | BRONX | 1 | CLASON POINT LANE SOUTH | 1 CLASON POINT LANE SOUTH | 10473 | CLASON POINT GARDENS | SACK WERN | 9 | 3633 | 1 | 2092744 |
| 1 | BRONX | 1 | FORDHAM PLAZA | 1 FORDHAM PLAZA | 10458 | NON DEVELOPMENT FACILITY | RENTED PROPERTY | 6 | 3033 | 53 | 2088325 |
| 2 | BRONX | 10 | CLASON POINT LANE NORTH | 10 CLASON POINT LANE NORTH | 10473 | CLASON POINT GARDENS | SACK WERN | 9 | 3633 | 1 | 2092734 |
| 3 | BRONX | 1000 | ROSEDALE AVENUE | 1000 ROSEDALE AVENUE | 10472 | SOTOMAYOR HOUSES | SOTOMAYOR HOUSES | 9 | 3725 | 1 | 2123862 |
| 4 | BRONX | 1000 | TRINITY AVENUE | 1000 TRINITY AVENUE | 10456 | FOREST | FOREST | 3 | 2639 | 1 | 2091961 |
df_nycha_address.shape
(5401, 11)
df_nycha_address[df_nycha_address['ZIP CODE'].isnull()]
| BOROUGH | HOUSE # | STREET | ADDRESS | ZIP CODE | DEVELOPMENT | MANAGED BY | COMMUNITY DISTRICT# | BLOCK | LOT | BIN# | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3289 | MANHATTAN | NaN | NORTH BED OF MONROE STREET | NORTH BED OF MONROE STREET | NaN | VLADECK II | VLADECK | 3 | 263 | 1 | NaN |
We have one null value for ZIP code - we'll want to get rid of that
df_nycha_address = df_nycha_address[df_nycha_address['ZIP CODE'].notnull()]
Let's concatenate Address and ZIP code together
df_nycha_address['address_string'] = df_nycha_address.apply(lambda x: f"{x['ADDRESS']}, {x['ZIP CODE']}", axis=1)
Don't leave your personal token in your production script!
You can create a free account with Geocodio here to get a token: https://dash.geocod.io/register
# substitute in your token here
client = GeocodioClient('XXXXXXX')
Geocodio allows 2,500 free lookups per day - we'll take a sample of 25 so we don't blow it up
df_sample = df_nycha_address.sample(25)
Now we'll call the geocoding API
https://www.geocod.io/docs/?python#single-address
df_sample['gc'] = df_sample['address_string'].apply(lambda x: client.geocode(x))
Draw out latitude & longitude into their own fields
df_sample['lat'] = df_sample['gc'].apply(lambda x: x['results'][0]['location']['lat'])
df_sample['lon'] = df_sample['gc'].apply(lambda x: x['results'][0]['location']['lng'])
df_sample
| BOROUGH | HOUSE # | STREET | ADDRESS | ZIP CODE | DEVELOPMENT | MANAGED BY | COMMUNITY DISTRICT# | BLOCK | LOT | BIN# | address_string | gc | lat | lon | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2581 | BROOKLYN | 429 | DUMONT AVENUE | 429 DUMONT AVENUE | 11212 | VAN DYKE I | VAN DYKE I | 16 | 3777 | 1 | 3328046 | 429 DUMONT AVENUE, 11212 | {'input': {'address_components': {'number': '4... | 40.665387 | -73.903596 |
| 823 | BRONX | 393 | EAST 155TH STREET | 393 EAST 155TH STREET | 10455 | POLICE SERVICE AREA #7 | NYPD HOUSING BUREAU | 1 | 2402 | 17 | 393 EAST 155TH STREET, 10455 | {'input': {'address_components': {'number': '3... | 40.820374 | -73.916421 | |
| 4089 | MANHATTAN | 416 | EAST 105TH STREET | 416 EAST 105TH STREET | 10029 | EAST RIVER | EAST RIVER | 11 | 1696 | 1 | 1083946 | 416 EAST 105TH STREET, 10029 | {'input': {'address_components': {'number': '4... | 40.788331 | -73.940376 |
| 2649 | BROOKLYN | 472 | MARCY AVENUE | 472 MARCY AVENUE | 11206 | MARCY | MARCY | 3 | 1719 | 1 | 3325585 | 472 MARCY AVENUE, 11206 | {'input': {'address_components': {'number': '4... | 40.698714 | -73.950816 |
| 5131 | QUEENS | 67-01 | KISSENA BOULEVARD | 67-01 KISSENA BOULEVARD | 11367 | POMONOK | POMONOK | 8 | 6792 | 30 | 4455430 | 67-01 KISSENA BOULEVARD, 11367 | {'input': {'address_components': {'number': '6... | 40.734721 | -73.814313 |
| 3103 | BROOKLYN | 77 | VANDALIA AVENUE | 77 VANDALIA AVENUE | 11239 | VANDALIA AVENUE | PENNSYLVANIA-WORTMAN | 5 | 4431 | 100 | 3348860 | 77 VANDALIA AVENUE, 11239 | {'input': {'address_components': {'number': '7... | 40.650426 | -73.886568 |
| 4150 | MANHATTAN | 447 | WEST 25TH STREET | 447 WEST 25TH STREET | 10001 | ELLIOTT | CHELSEA | 4 | 723 | 1 | 1083579 | 447 WEST 25TH STREET, 10001 | {'input': {'address_components': {'number': '4... | 40.748728 | -74.002094 |
| 4810 | QUEENS | 219-30 | 102ND AVENUE | 219-30 102ND AVENUE | 11429 | FHA REPOSSESSED HOUSES (GROUP V) | BAISLEY PARK | 13 | 10772 | 21 | 4230889 | 219-30 102ND AVENUE, 11429 | {'input': {'address_components': {'number': '2... | 40.713633 | -73.734621 |
| 2940 | BROOKLYN | 621 | PARK AVENUE | 621 PARK AVENUE | 11206 | MARCY | MARCY | 3 | 1719 | 1 | 3325594 | 621 PARK AVENUE, 11206 | {'input': {'address_components': {'number': '6... | 40.697989 | -73.951125 |
| 2976 | BROOKLYN | 651 | EAST 108TH STREET | 651 EAST 108TH STREET | 11236 | BREUKELEN | BREUKELEN | 18 | 8158 | 1 | 3331299 | 651 EAST 108TH STREET, 11236 | {'input': {'address_components': {'number': '6... | 40.651141 | -73.894502 |
| 3594 | MANHATTAN | 2 | WEST 115TH STREET | 2 WEST 115TH STREET | 10026 | KING TOWERS | KING TOWERS | 10 | 1596 | 1 | 1083316 | 2 WEST 115TH STREET, 10026 | {'input': {'address_components': {'number': '2... | 40.799912 | -73.947447 |
| 4686 | QUEENS | 144-23 | 166TH STREET | 144-23 166TH STREET | 11434 | FHA REPOSSESSED HOUSES (GROUP III) | BAISLEY PARK | 13 | 13272 | 54 | 4284945 | 144-23 166TH STREET, 11434 | {'input': {'address_components': {'number': '1... | 40.665600 | -73.777202 |
| 1784 | BROOKLYN | 185 | GOLD STREET | 185 GOLD STREET | 11201 | FARRAGUT | FARRAGUT | 2 | 90 | 1 | 3325920 | 185 GOLD STREET, 11201 | {'input': {'address_components': {'number': '1... | 40.699272 | -73.982684 |
| 4566 | QUEENS | 108-60 | 159TH STREET | 108-60 159TH STREET | 11433 | SOUTH JAMAICA I | SOUTH JAMAICA I | 12 | 10146 | 51 | 4457473 | 108-60 159TH STREET, 11433 | {'input': {'address_components': {'number': '1... | 40.693830 | -73.793401 |
| 4939 | QUEENS | 40-04 | 10TH STREET | 40-04 10TH STREET | 11101 | QUEENSBRIDGE NORTH | QUEENSBRIDGE NORTH | 1 | 470 | 100 | 4433382 | 40-04 10TH STREET, 11101 | {'input': {'address_components': {'number': '4... | 40.756060 | -73.943544 |
| 1184 | BRONX | 875 | TAYLOR AVENUE | 875 TAYLOR AVENUE | 10473 | MONROE | MONROE | 9 | 3637 | 1 | 2092754 | 875 TAYLOR AVENUE, 10473 | {'input': {'address_components': {'number': '8... | 40.822949 | -73.863679 |
| 4191 | MANHATTAN | 50 | EAST 106TH STREET | 50 EAST 106TH STREET | 10029 | CARVER | CARVER | 11 | 1610 | 23 | 1078840 | 50 EAST 106TH STREET, 10029 | {'input': {'address_components': {'number': '5... | 40.793153 | -73.949269 |
| 1692 | BROOKLYN | 1761 | STERLING PLACE | 1761 STERLING PLACE | 11233 | HOWARD AVENUE-PARK PLACE | PARK ROCK CONSOLIDATED | 16 | 1466 | 49 | 3039386 | 1761 STERLING PLACE, 11233 | {'input': {'address_components': {'number': '1... | 40.671041 | -73.920315 |
| 2440 | BROOKLYN | 353 | BERRIMAN STREET | 353 BERRIMAN STREET | 11208 | FHA REPOSSESSED HOUSES (GROUP IX) | BAISLEY PARK | 5 | 4070 | 19 | 3090234 | 353 BERRIMAN STREET, 11208 | {'input': {'address_components': {'number': '3... | 40.669834 | -73.878656 |
| 4324 | MANHATTAN | 666 | WATER STREET | 666 WATER STREET | 10002 | VLADECK | VLADECK | 3 | 260 | 75 | 1083388 | 666 WATER STREET, 10002 | {'input': {'address_components': {'number': '6... | 40.712336 | -73.981766 |
| 2067 | BROOKLYN | 265 | LIVONIA AVENUE | 265 LIVONIA AVENUE | 11212 | TILDEN | TILDEN | 16 | 3576 | 1 | 3326546 | 265 LIVONIA AVENUE, 11212 | {'input': {'address_components': {'number': '2... | 40.663083 | -73.908307 |
| 1568 | BROOKLYN | 1636 | PARK PLACE | 1636 PARK PLACE | 11233 | CROWN HEIGHTS | PARK ROCK CONSOLIDATED | 8 | 1375 | 31 | 3036637 | 1636 PARK PLACE, 11233 | {'input': {'address_components': {'number': '1... | 40.671831 | -73.923213 |
| 5078 | QUEENS | 50-31 | NEWTOWN ROAD | 50-31 NEWTOWN ROAD | 11377 | WOODSIDE | WOODSIDE | 1 | 736 | 2 | 4430993 | 50-31 NEWTOWN ROAD, 11377 | {'input': {'address_components': {'number': '5... | 40.754003 | -73.910627 |
| 1322 | BROOKLYN | 111 | NOSTRAND AVENUE | 111 NOSTRAND AVENUE | 11206 | MARCY | MARCY | 3 | 1738 | 1 | 3324242 | 111 NOSTRAND AVENUE, 11206 | {'input': {'address_components': {'number': '1... | 40.696083 | -73.952342 |
| 3578 | MANHATTAN | 1940 | LEXINGTON AVENUE | 1940 LEXINGTON AVENUE | 10035 | UPACA (SITE 6) | ROBINSON | 11 | 1768 | 12 | 1054368 | 1940 LEXINGTON AVENUE, 10035 | {'input': {'address_components': {'number': '1... | 40.801012 | -73.940138 |
import requests
import json
import plotly.graph_objects as go
Read in ZCTA COVID data and get it into proper shape (sorry, this isn't a pandas demo!)
Source: https://github.com/nychealth/coronavirus-data/blob/master/trends/percentpositive-by-modzcta.csv
url_nyc_zcta = 'https://raw.githubusercontent.com/nychealth/coronavirus-data/master/trends/percentpositive-by-modzcta.csv'
df_nyc_zcta = pd.read_csv(url_nyc_zcta)
df_nyc_zcta['week_ending'] = pd.to_datetime(df_nyc_zcta['week_ending'])
df_nyc_zcta.set_index('week_ending', inplace=True)
df_nyc_zcta = df_nyc_zcta.iloc[[-1]]
df_nyc_zcta.rename(columns=lambda x: x.split('_')[-1], inplace=True)
zcta_data = df_nyc_zcta.T.reset_index()
map_week = zcta_data.columns.values[1]
zcta_data.rename(columns={'index':'geoid', map_week: 'pct_pos'}, inplace=True)
zcta_data['pct_pos'] = zcta_data['pct_pos']/100
zcta_data
| week_ending | geoid | pct_pos |
|---|---|---|
| 0 | CITY | 0.0281 |
| 1 | BX | 0.0265 |
| 2 | BK | 0.0330 |
| 3 | MN | 0.0202 |
| 4 | QN | 0.0308 |
| ... | ... | ... |
| 178 | 11691 | 0.0468 |
| 179 | 11692 | 0.0230 |
| 180 | 11693 | 0.0223 |
| 181 | 11694 | 0.0292 |
| 182 | 11697 | 0.0419 |
183 rows × 2 columns
Read in ZCTA geojson file
Source: https://github.com/nychealth/coronavirus-data/tree/master/Geography-resources
map_url = 'https://raw.githubusercontent.com/nychealth/coronavirus-data/master/Geography-resources/MODZCTA_2010_WGS1984.geo.json'
zcta_map = requests.get(map_url).json()
We'll use mapbox for some extra formatting control - it's not strictly necessary
Otherwise, you might get some display wonkiness if you leave the default map style - I think it's a known bug, and will probably get fixed sooner or later
You can create a free Mapbox account here for a token: https://account.mapbox.com/auth/signup/
# substitute in your token here
mapbox_token = 'XXXXXXXXX'
Set plot dimensions
w = 800
h = 600
Set up choropleth map object
You get lots of parameters to play with
choro = go.Choroplethmapbox(geojson=zcta_map,
featureidkey='properties.MODZCTA',
locations=zcta_data['geoid'],
z=zcta_data['pct_pos'],
colorscale="Reds",
marker_opacity=0.9,
colorbar_tickformat='%',
colorbar_title = "Test Positivity"
)
Set up scatterplot object to overlay on top of choropleth
scatter = go.Scattermapbox(lat=df_sample['lat'],
lon=df_sample['lon'],
mode='markers+text',
text=df_sample['DEVELOPMENT'],
marker=dict(size=8, color='blue')
)
Set layout parameters
layout = go.Layout(
# mapbox_style="carto-positron", # this is a default, which didn't display properly when I tried
mapbox_style='mapbox://styles/andrewfair/ckmlj4e4e0drx17rvxslm7kzi', # you would substitute your style here
mapbox_accesstoken=mapbox_token,
margin={"r":0,"t":50,"l":0,"b":0},
mapbox_zoom=9.36,
mapbox_center = {"lat": 40.7128, "lon": -74.0060},
title_text='<b>COVID-19 Test Percent Positivity</b><br>Citywide by ZIP Code, Week Ending {}'.format(map_week.strftime('%m/%d/%Y')),
coloraxis_colorbar_tickformat='%',
width=w,
height=h
)
Now, plot it!
fig = go.Figure(data=[choro, scatter], layout=layout)
fig.show()